Reward-Weighted Regression Converges to a Global Optimum
Authors
Abstract
Reward-Weighted Regression (RWR) belongs to a family of widely known iterative Reinforcement Learning algorithms based on the Expectation-Maximization framework. In this family, learning at each iteration consists of sampling a batch of trajectories using the current policy and fitting a new policy to maximize a return-weighted log-likelihood of actions. Although RWR is known to yield monotonic improvement of the policy under certain circumstances, whether and under which conditions RWR converges to the optimal policy have remained open questions. In this paper, we provide for the first time a proof that RWR converges to a global optimum when no function approximation is used, in a general compact setting. Furthermore, for the simpler case with finite state and action spaces, we prove R-linear convergence of the state-value function to the optimum.
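The iteration described in the abstract can be sketched for a small tabular problem. The following is a minimal illustration and not the authors' code. It assumes strictly positive rewards and uses the exact update pi_new(a|s) = pi(a|s) * Q(s,a) / V(s), which is the form the return-weighted fit is assumed to take here when no function approximation and no sampling are involved. The two-state MDP, its numbers, and all function names are hypothetical.

```python
# Hypothetical toy MDP: 2 states, 2 actions, strictly positive rewards.
GAMMA = 0.9
N_S, N_A = 2, 2
# P[s][a][t] = probability of moving from state s to state t under action a.
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.7, 0.3], [0.1, 0.9]]]
# R[s][a] = immediate reward (must be positive for the weighting to be valid).
R = [[1.0, 2.0],
     [0.5, 3.0]]


def evaluate(policy, sweeps=400):
    """Iterative policy evaluation; returns (V, Q) with V[s] = sum_a pi(a|s) Q[s][a]."""
    v = [0.0] * N_S
    q = [[0.0] * N_A for _ in range(N_S)]
    for _ in range(sweeps):
        q = [[R[s][a] + GAMMA * sum(P[s][a][t] * v[t] for t in range(N_S))
              for a in range(N_A)] for s in range(N_S)]
        v = [sum(policy[s][a] * q[s][a] for a in range(N_A)) for s in range(N_S)]
    return v, q


def rwr_update(policy, v, q):
    """Reweight each action probability by its return relative to the state value."""
    return [[policy[s][a] * q[s][a] / v[s] for a in range(N_A)]
            for s in range(N_S)]


def run_rwr(iterations=200):
    """Start from the uniform policy and apply the exact update repeatedly."""
    policy = [[1.0 / N_A] * N_A for _ in range(N_S)]
    values = []  # state values of each successive policy
    for _ in range(iterations):
        v, q = evaluate(policy)
        values.append(v)
        policy = rwr_update(policy, v, q)
    return policy, values


if __name__ == "__main__":
    final_policy, history = run_rwr()
    print("final policy:", final_policy)
    print("first and last state values:", history[0], history[-1])
```

In this toy problem the recorded state values should not decrease from one iteration to the next, and the policy should concentrate on the higher-return action in each state, which is the behaviour the abstract refers to as monotonic improvement and convergence to the optimum.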
Similar resources
Collective Learning Generally Overcomes Local Optima and Converges to the Global Optimum
Local minima represent a major problem for neural network learning procedures. In this article we present a new procedure, collective learning, that leads to improved global convergence. We have tested our procedure on several neural networks and on the multimodal functions proposed by De Jong and Rastrigin. In our tests we have reached a success ratio of 100 %. In addition we give a few remark...
Episodic Reinforcement Learning by Logistic Reward-Weighted Regression
It has been a long-standing goal in the adaptive control community to reduce the generically difficult, general reinforcement learning (RL) problem to simpler problems solvable by supervised learning. While this approach is today’s standard for value function-based methods, fewer approaches are known that apply similar reductions to policy search methods. Recently, it has been shown that immedi...
A modification to geographically weighted regression
BACKGROUND Geographically weighted regression (GWR) is a modelling technique designed to deal with spatial non-stationarity, e.g., the mean values vary by locations. It has been widely used as a visualization tool to explore the patterns of spatial data. However, the GWR tends to produce unsmooth surfaces when the mean parameters have considerable variations, partly due to that all parameter es...
A WEIGHTED LINEAR REGRESSION MODEL FOR IMPERCISE RESPONSE
A weighted linear regression model with imprecise response and p-real explanatory variables is analyzed. The LR fuzzy random variable is introduced and a metric is suggested for coping with this kind of variable. A least squares solution for estimating the parameters of the model is derived. The results are illustrated by means of some case studies.
On rigorous upper bounds to a global optimum
In branch and bound algorithms in constrained global optimization, a sharp upper bound on the global optimum is important for the overall efficiency of the branch and bound process. Software to find local optimizers, using floating point arithmetic, often computes an approximately feasible point close to an actual global optimizer. Not mathematically rigorous algorithms can simply evaluate the ...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2022
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v36i8.20811